De-Novo Discovery of Differentially Abundant Transcription Factor Binding Sites Including Their Positional Preference
Abstract
Transcription factors are a main component of gene regulation, as they activate or repress gene expression by binding to specific binding sites in promoters. The de-novo discovery of transcription factor binding sites in target regions obtained by wet-lab experiments is a challenging problem in computational biology that has not yet been fully solved. Here, we present a de-novo motif discovery tool called Dispom for finding differentially abundant transcription factor binding sites that models existing positional preferences of binding sites and adjusts the length of the motif in the learning process. Evaluating Dispom, we find that its prediction performance is superior to that of existing tools for de-novo motif discovery on 18 benchmark data sets with planted binding sites, and on a metazoan compendium based on experimental data from microarray, ChIP-chip, ChIP-DSL, and DamID as well as Gene Ontology data. Finally, we apply Dispom to find binding sites differentially abundant in promoters of auxin-responsive genes extracted from Arabidopsis thaliana microarray data, and we find a motif that can be interpreted as a refined auxin-responsive element predominantly positioned in the 250-bp region upstream of the transcription start site. Using an independent data set of auxin-responsive genes, we find in genome-wide predictions that the refined motif is more specific for auxin-responsive genes than the canonical auxin-responsive element. In general, Dispom can be used to find differentially abundant motifs in sequences of any origin. However, the positional distribution learned by Dispom is especially beneficial if all sequences are aligned to some anchor point, such as the transcription start site in the case of promoter sequences. We demonstrate that the combination of searching for differentially abundant motifs and inferring a position distribution from the data is beneficial for de-novo motif discovery.
Hence, we make the tool freely available as a component of the open-source Java framework Jstacs and as a stand-alone application at http://www.jstacs.de/index.php/Dispom.
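The two ideas the abstract combines, differential abundance and positional preference relative to an anchor point, can be illustrated with a deliberately simple k-mer sketch. This is not Dispom's learning algorithm and does not use the Jstacs API; the function names, the pseudocount, and the assumption that every sequence ends at the anchor (for example the transcription start site) are hypothetical choices made only for illustration.

```python
import math
from collections import Counter


def log_enrichment(foreground, background, kmer, pseudo=1.0):
    """Log ratio of the fraction of sequences containing `kmer`
    in the foreground versus the background set.
    A pseudocount (an assumption of this sketch) avoids division by zero."""
    fg_hits = sum(kmer in seq for seq in foreground)
    bg_hits = sum(kmer in seq for seq in background)
    fg_rate = (fg_hits + pseudo) / (len(foreground) + 2 * pseudo)
    bg_rate = (bg_hits + pseudo) / (len(background) + 2 * pseudo)
    return math.log(fg_rate / bg_rate)


def position_histogram(sequences, kmer, bin_size=50):
    """Histogram of k-mer start positions, binned by distance upstream
    of the anchor. Assumes each sequence ends at the anchor point."""
    hist = Counter()
    for seq in sequences:
        start = seq.find(kmer)
        while start != -1:
            distance = len(seq) - start  # bases between k-mer start and anchor
            hist[(distance - 1) // bin_size] += 1
            start = seq.find(kmer, start + 1)  # allow overlapping matches
    return dict(hist)


if __name__ == "__main__":
    # Made-up toy sequences, not real promoter data.
    fg = ["AAAATGTCTCAA", "CCTGTCTCGGGG", "TTTTTTTTTTTT"]
    bg = ["AAAAAAAAAAAA", "CCCCCCCCCCCC", "GGGGTGTCTCGG"]
    print(log_enrichment(fg, bg, "TGTCTC"))
    print(position_histogram(fg, "TGTCTC", bin_size=4))
```

A positive log ratio indicates that the k-mer occurs in a larger share of foreground than background sequences, and a histogram concentrated in a few bins suggests a positional preference. A tool such as Dispom learns the motif, its length, and the position distribution jointly, which this fixed k-mer count does not attempt.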
Similar resources
PPARgamma in adipocyte differentiation - a ChIP-Seq case study
• peak finding and analysis for known transcription factor binding sites, • definition of de novo binding site matrices from cluster sequences, • identification and analysis of potential target genes including associated pathways, • promoter analysis and identification of a common regulatory framework in a gene subset and subsequent scan of all annotated promoters for matches for this framework...
Motif discovery and transcription factor binding sites before and after the next-generation sequencing era
Motif discovery has been one of the most widely studied problems in bioinformatics ever since genomic and protein sequences have been available. In particular, its application to the de novo prediction of putative over-represented transcription factor binding sites in nucleotide sequences has been, and still is, one of the most challenging flavors of the problem. Recently, novel experimental te...
Discovery of transcription factor binding sites through integration of generic motif finders
Locating transcription factor binding sites is a key step in understanding gene regulation. Due to its importance, many de novo motif finding methods (e.g. MEME, MotifSampler, Mitra and Weeder) have been proposed. Individually, these motif finders perform unimpressively overall based on Tompa’s benchmark datasets. Moreover, these motif finders vary in their definitions of what constitute a moti...
csaw: a Bioconductor package for differential binding analysis of ChIP-seq data using sliding windows.
Chromatin immunoprecipitation with massively parallel sequencing (ChIP-seq) is widely used to identify binding sites for a target protein in the genome. An important scientific application is to identify changes in protein binding between different treatment conditions, i.e. to detect differential binding. This can reveal potential mechanisms through which changes in binding may contribute to t...
PriorsEditor: a tool for the creation and use of positional priors in motif discovery
SUMMARY Computational methods designed to discover transcription factor binding sites in DNA sequences often have a tendency to make a lot of false predictions. One way to improve accuracy in motif discovery is to rely on positional priors to focus the search to parts of a sequence that are considered more likely to contain functional binding sites. We present here a program called PriorsEditor...